The GNAT library for local and remote gene mention normalization
Authors
Abstract
SUMMARY: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers has become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems for biomedical text mining are publicly available as source code or web services. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component integrated with other text-mining systems, as a framework for user-specific extensions, and as an efficient stand-alone application that identifies gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a TAP-20 score of 0.1987.
AVAILABILITY: The library and web services are implemented in Java, and the sources are available from http://gnat.sourceforge.net.
CONTACT: [email protected].
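As a rough illustration of what "normalizing mentions to database identifiers" means in practice, the sketch below matches text against a small synonym lexicon and attaches a gene identifier to each hit. It is a toy written for this page, not GNAT's API: the class ToyGeneNormalizer, its methods, and the two lexicon entries are all hypothetical, and a real system such as GNAT also has to handle species disambiguation and ambiguous names, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy dictionary-based gene mention normalizer (hypothetical; not the GNAT API). */
public class ToyGeneNormalizer {

    /** A mention found in the text plus the identifier assigned to it. */
    public static final class Mention {
        public final String text;
        public final int start;   // inclusive character offset
        public final int end;     // exclusive character offset
        public final String geneId;

        Mention(String text, int start, int end, String geneId) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.geneId = geneId;
        }
    }

    // Synonym -> identifier, kept in insertion order so output is deterministic.
    private final Map<String, String> lexicon = new LinkedHashMap<>();

    /** Registers one gene name or synonym for the given identifier. */
    public void addSynonym(String name, String geneId) {
        lexicon.put(name, geneId);
    }

    /** Returns every case-insensitive, whole-token lexicon match in the text. */
    public List<Mention> normalize(String text) {
        List<Mention> found = new ArrayList<>();
        for (Map.Entry<String, String> entry : lexicon.entrySet()) {
            // Lookarounds reject matches embedded in a longer alphanumeric token.
            Pattern p = Pattern.compile(
                "(?<![A-Za-z0-9])" + Pattern.quote(entry.getKey()) + "(?![A-Za-z0-9])",
                Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(text);
            while (m.find()) {
                found.add(new Mention(m.group(), m.start(), m.end(), entry.getValue()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        ToyGeneNormalizer normalizer = new ToyGeneNormalizer();
        // Illustrative entries: human TP53 and BRCA1 with their Entrez Gene IDs.
        normalizer.addSynonym("TP53", "7157");
        normalizer.addSynonym("BRCA1", "672");
        for (Mention m : normalizer.normalize("Mutations in tp53 and BRCA1 were found.")) {
            System.out.println(m.text + " [" + m.start + "," + m.end + ") -> " + m.geneId);
        }
    }
}
```

Exact lexicon lookup is the simplest possible design. It was chosen here only to make the mention-to-identifier mapping concrete, and it says nothing about how GNAT itself recognizes or ranks candidates.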
Similar resources
Gene mention normalization in full texts using GNAT and LINNAEUS
Gene mention normalization (GN) refers to the automated mapping of gene names to a unique identifier, such as an NCBI Entrez Gene ID. Such knowledge helps in indexing and retrieval, linkage to additional information (such as sequences), database curation, and data integration. We present here an ensemble system encompassing LINNAEUS for recognizing organism names and GNAT for recognition and no...
Inter-species normalization of gene mentions with GNAT
MOTIVATION Text mining in the biomedical domain aims at helping researchers to access information contained in scientific publications in a faster, easier and more complete way. One step towards this aim is the recognition of named entities and their subsequent normalization to database identifiers. Normalization helps to link objects of potential interest, such as genes, to detailed informatio...
Supporting Ada 95 Passive Partitions in a Distributed Environment
Ada 95 passive partitions, containing passive library units, provide the means to distribute data within a network of workstations. This paper shows how passive partitions can be implemented via distributed shared virtual memory (DSM). DSM provides the logical view of a portion of memory shared between physically distributed workstations within a network. In this paper, we relate design issues ...
An Open Ravenscar Real-Time Kernel for GNAT
This paper describes the architecture of ORK, an open source real-time kernel that implements the Ravenscar profile for the GNAT compilation system on a bare ERC32 computer. The kernel has a reduced size and complexity, and has been carefully designed in order to make it possible to build reliable software for on-board space applications. The kernel is closely integrated with the GNAT runtime li...
Real-Time Programming with GNAT: Specialised Kernels versus POSIX Threads
Because most GNAT ports are based on non-real-time operating systems, they are of limited use for developing real-time systems. Conversely, existing ports to real-time operating systems are excessively complex, since GNAT uses only a small subset of their functionality, and with very specific semantics. This paper describes the implementation of a low-level tasking support for ...